feat(keypoint-detection): add COCO OKS-AP evaluation #949

Open: jeon185 wants to merge 1 commit into feat/keypoint-detection-enablement from feat/keypoint-detection-eval

jeon185 (Contributor) commented Jun 23, 2026

Adds the eval stage for keypoint-detection (ViTPose), so the COCO-keypoint models from #284 now go through config -> build -> perf -> eval. Stacked on #905 (the config/build/perf enablement) - that one should go in first.

What's here:

  • metrics/keypoint.py - KeypointAPMetric. Computes the COCO keypoint score (OKS-based AP over 0.50:0.95) with pycocotools COCOeval, the same way object-detection already reuses the COCO mAP protocol.
  • keypoint_detection_evaluator.py - top-down evaluator. transformers has no keypoint-detection pipeline, so it runs the image processor and ONNX model directly: for each ground-truth person box it does preprocess -> model -> post_process_pose_estimation and scores against the GT keypoints. ViTPose is exported with a static batch of 1, so each person crop runs separately and the heatmaps are stacked back together for post-processing. It uses the GT person boxes (standard COCO top-down protocol - keeps the score about pose accuracy, not detection).
  • scripts/build_coco_keypoints.py - builds a local COCO val keypoints dataset. COCO has no script-free HF mirror for person keypoints, so this downloads the annotations once and fetches images individually, which means a small subset doesn't need the full image zip.
  • Schema, evaluator registry, default dataset, and unit tests for the metric and evaluator.
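For readers unfamiliar with the metric, here is a minimal sketch of the object keypoint similarity (OKS) that the COCO keypoint AP is built on. It is not the code in metrics/keypoint.py, which delegates to pycocotools COCOeval; the function name `oks` and its argument layout are illustrative assumptions. The sigmas are the standard per-keypoint constants for the 17 COCO person keypoints.

```python
import math

# Standard OKS sigmas for the 17 COCO person keypoints, in COCO order
# (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles).
COCO_SIGMAS = [0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
               0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089]


def oks(pred, gt, visibility, area, sigmas=COCO_SIGMAS):
    """Object keypoint similarity between one prediction and one GT person.

    pred, gt:    lists of (x, y) pixel coordinates, one pair per keypoint
    visibility:  COCO visibility flags per keypoint (0 means unlabeled)
    area:        ground-truth object area in squared pixels
    Hypothetical helper for illustration; pycocotools also adds a tiny
    epsilon to the area, which this sketch omits.
    """
    if not (len(pred) == len(gt) == len(visibility) == len(sigmas)):
        raise ValueError("prediction, ground truth and sigmas must cover "
                         "the same keypoint set")
    total, labeled = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute to the score
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma
        total += math.exp(-d2 / (2.0 * area * k * k))
        labeled += 1
    return total / labeled if labeled else 0.0
```

COCO keypoint AP then counts a prediction as a match when its OKS exceeds a threshold, and averages AP over the thresholds 0.50, 0.55, ..., 0.95.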

Verified on the five COCO 17-keypoint models (vitpose-base-simple and vitpose-plus-{small,base,large,huge}): config -> build -> perf -> eval all pass and return COCO AP/AR. AP rises with model size, as expected. Absolute numbers are currently on the low side because the build quantizes with random calibration data, but the relative ordering of the models holds.

synthpose-vitpose-huge-hf - not covered yet

This is the one model from #284 that this PR does not evaluate. It predicts 52 anatomical keypoints instead of COCO's 17, so it can't be scored against COCO ground truth - the keypoint sets don't line up and OKS is only defined when they do.

How it's handled for now: the metric checks the keypoint count up front and raises a clear, actionable error instead of failing with a numpy broadcast error deep inside pycocotools.
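As a rough illustration of this kind of up-front check (not the actual code in KeypointAPMetric), the sketch below compares the number of keypoints a model predicts against the ground-truth keypoint set and fails early with a descriptive message. The function name `check_keypoint_count` and the error wording are hypothetical.

```python
# The 17 COCO person keypoint names, in COCO order.
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def check_keypoint_count(predicted_count, keypoint_names=COCO_KEYPOINT_NAMES):
    """Raise early if the model's keypoint count differs from the GT set.

    Hypothetical helper: OKS is only defined when both sides describe the
    same keypoints, so a mismatch is reported before any scoring starts.
    """
    expected = len(keypoint_names)
    if predicted_count != expected:
        raise ValueError(
            f"Model predicts {predicted_count} keypoints but the ground truth "
            f"defines {expected}. Provide a dataset, sigmas and keypoint_names "
            f"that match the model's keypoint set."
        )


check_keypoint_count(17)  # COCO-compatible model: passes silently
```

Calling `check_keypoint_count(52)` against the default COCO names raises a `ValueError` that names both counts, rather than surfacing later as a shape error.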

Idea for finishing it: KeypointAPMetric already takes sigmas and keypoint_names as arguments, so the main missing piece is a dataset with SynthPose's 52-keypoint ground truth plus the matching OKS sigmas. I'd rather agree on the dataset and sigmas in review before adding that - happy to land it in this PR or as a follow-up, whichever you prefer.

Refs #284.

Adds the eval stage for keypoint-detection (ViTPose), completing
config -> build -> perf -> eval for the COCO-keypoint models in #284.

- metrics/keypoint.py: KeypointAPMetric computes the COCO keypoint score
  (OKS-based AP over 0.50:0.95) via pycocotools COCOeval, the same way
  object-detection reuses the COCO mAP protocol.
- keypoint_detection_evaluator.py: top-down evaluator. transformers has no
  keypoint-detection pipeline, so it drives the image processor and ONNX model
  directly - per ground-truth person box it runs preprocess -> model ->
  post_process_pose_estimation and scores against GT keypoints. ViTPose exports
  a static batch of 1, so each person crop runs separately and the heatmaps are
  stacked for post-processing. Uses GT person boxes (standard COCO top-down,
  isolates pose accuracy from detection).
- scripts/build_coco_keypoints.py: builds a local COCO val keypoints dataset;
  downloads annotations once and fetches images individually so a subset does
  not need the full image zip.
- Schema, evaluator registry, default dataset, unit tests.
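
The per-crop inference described in the evaluator item above can be sketched roughly as follows. This is an illustration only, not the evaluator's code: `stub_model` stands in for the ONNX session, and the heatmap shape (17, 64, 48) and crop size are assumed values chosen for the example.

```python
import numpy as np

NUM_KEYPOINTS = 17        # assumed: COCO keypoint count
HEATMAP_SIZE = (64, 48)   # assumed heatmap resolution for illustration


def stub_model(pixel_values):
    """Stand-in for a model exported with a static batch of 1.

    Rejects any other batch size and returns heatmaps filled with the
    crop's mean value, so the output can be traced back to its input.
    """
    if pixel_values.shape[0] != 1:
        raise ValueError("model was exported with a static batch of 1")
    shape = (1, NUM_KEYPOINTS, *HEATMAP_SIZE)
    return np.full(shape, pixel_values.mean(), dtype=np.float32)


def run_per_person(crops, model=stub_model):
    """Run each person crop separately, then stack heatmaps along the batch axis.

    crops: array of shape (num_people, 3, H, W).
    Returns an array of shape (num_people, NUM_KEYPOINTS, h, w).
    """
    if crops.shape[0] == 0:
        return np.zeros((0, NUM_KEYPOINTS, *HEATMAP_SIZE), dtype=np.float32)
    # crops[i:i + 1] keeps the leading batch dimension at size 1.
    heatmaps = [model(crops[i:i + 1]) for i in range(crops.shape[0])]
    return np.concatenate(heatmaps, axis=0)
```

The stacked array has one entry per person box, in input order, which is the layout a batched post-processing step would expect.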

Verified on the five COCO 17-keypoint models (vitpose-base-simple,
vitpose-plus-{small,base,large,huge}): config -> build -> perf -> eval all
pass and return COCO AP/AR.

synthpose-vitpose-huge-hf is not covered yet. It predicts 52 anatomical
keypoints rather than COCO's 17, so it can't be scored against COCO ground
truth - the keypoint sets don't line up, and OKS is only defined when they do.
Right now the metric detects this mismatch and raises a clear error instead of
failing deep inside pycocotools. KeypointAPMetric already takes sigmas and
keypoint_names as arguments, so supporting SynthPose mainly needs a dataset
with its 52-keypoint ground truth plus the matching OKS sigmas; I'd rather
confirm the dataset/sigmas choice in review before adding that. Open to
suggestions on whether to land it here or as a follow-up.

Refs #284.
jeon185 requested a review from a team as a code owner on June 23, 2026 18:37